home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
AOL File Library: 4,401 to 4,500
/
aol-file-protocol-4400-4401-to-4500.zip
/
AOLDLs
/
PDA-Newton Development
/
ND+ NewtonScript ByteCode
/
bytcd.txt
next >
Wrap
Text File
|
2014-12-08
|
27KB
|
862 lines
NewtonScript ByteCode
Matthew Faupel
Draft 1
17 Nov 1994
Contents
1 Background to ByteCode 2
1.1 The Representation of Compiled NewtonScript 2
1.2 How Values are Encoded in NewtonScript 3
2 ByteCode 4
2.1 The General Format of ByteCode 4
2.2 Explanation of the Description of Individual ByteCodes 4
2.3 ByteCode Descriptions 5
Abstract
This technical report details my discoveries about NewtonScript bytecode.
It is not an official Apple document and as the information was discovered
by observation only, it may be neither accurate nor complete. This document
is distributed in the hope that it will be useful, but without any warranty;
without even the implied warranty of merchantability or fitness for a particular
purpose.
The discoveries detailed in this technical report cover how compiled NewtonScript
is represented on the Apple MessagePad platform, how values are encoded,
the general format of bytecode, and the behaviour of individual bytecodes.
The author is indebted to Jason Harper for his ViewFrame Demo program,
which made me realise that bytecode might not be as impenetrable as I first
thought. Any omissions or errors in this document are though, entirely
my responsibility.
Apple and Newton are trademarks of Apple Computer, Inc., registered in
the United States and other countries. MessagePad, NewtonScript and Newton
ToolKit are trademarks of Apple Computer, Inc. ViewFrame is a trademark
of Jason Harper.
1(null)(null) Background to ByteCode
In order to understand fully the description of NewtonScript bytecode given
in the second section of this report a number of things first need to be
made clear.
1.1 The Representation of Compiled NewtonScript
A compiled NewtonScript function is represented on the MessagePad as a
frame with the following fixed format:
{
class: 'CodeBlock,
instructions: <bytecode of type 'instructions>,
literals: [...],
argFrame: {
_nextArgFrame: {}
_Parent: {},
_implementor: {},
...
},
numArgs: <integer>
}
D class is always set to the symbol 'CodeBlock.
D instructions is a block of bytecode bytes, the meaning of which we will
get to a little later.
D literals is an array of all the literal values used in the NewtonScript
function (symbols, staticly defined frames, sub<#0106>functions and so
on).
D numArgs is the number of arguments to the function.
D argFrame consists of three sections:
S The first three items are always the fixed slots _nextArgFrame, _Parent
and _implementor.
S The next numArgs slots (i.e. possibly none if numArgs is zero) are arguments
of the function in the order that they appear in the source code.
S Any remaining slots (again possibly none) represent the local variables
declared within the function.
To give a concrete example, here is a short NewtonScript function and the
frame that represents it:
func( foo, bar ) {
begin class:
'CodeBlock,
local fred := '[]; instructions:
<instructions>,
/*
18 A5 7D 02 */
return fred
literals:
[ '[] ],
end argFrame:
{
_nextArgFrame:
{},
_Parent:
{},
_implementor:
{},
foo:
NIL,
bar:
NIL,
fred:
NIL
},
numArgs:
2
}
1.2 How Values are Encoded in NewtonScript
A knowledge of how NewtonScript encodes values is useful for understanding
some of the bytecodes. Certain simple values such as integers, and the
constants True and Nil are encoded as immediate values. More complex values
such as floats, icons, frames and arrays are encoded as separate data blocks
and then referred to via a pointer.
NewtonScript has a uniform way of encoding both the simple values and the
pointers to more complex values within a 32 bit field. It achieves this
by using the lowest two bits of the field as a type identifier. What the
value means for each of the four possible types is as follows:
00 The field represents a signed integer in the range -229 to +229-1.
i.e. if the low two bits of the 32 bit value are zero, the integer value
that it represents can be obtained by performing value >> 2.
01 The field represents a pointer to a complex structure. The pointer
value can be obtained by performing value & 0xFFFFFFFC.
10 The field represents a unique system value. As far as I'm aware
the possible values are:
0x00000002 Nil
0x0000001A True
0x000UUUU6 The Unicode character \uUUUU, e.g. the NewtonScript constant
$A translates to the immediate value 0x00000496.
11 The field represents a magic pointer. These are the constants
that are reperesented as @<number> and have names beginning with ROM_ in
the NTK definitions file. The number is encoded in the top 30 bits of
the field. As an example, ROM_asciishift (@7) is encoded as 0x0000001F,
i.e. ....000111 11.
2(null)(null) ByteCode
2.1 The General Format of ByteCode
All NewtonScript bytecodes are either one or three bytes long. If the
low three bits of the first byte in the sequence are all 1 then the following
two bytes are also part of the instruction, otherwise it is a single byte
instruction. The following are all examples of valid byte codes (shown
in hexadecimal):
00 73 5F 02 15 37
00 10
Very often (but not always) the low three bits are used as a count value
and if the required value exceeds 6 then all three bits are set and the
following two bytes are used instead for the value. Hence this initially
rather odd seeming format is actually quite useful for keeping bytecode
compact. The use of this count field is explained in more detail below.
2.2 Explanation of the Description of Individual ByteCodes
All of the bytecode descriptions given below follow the same format:
Octal value (Hex value) Name
Stack behaviour
Longer description.
The bytecodes are given as octal values first, because this makes it easier
to show the low three bits of the code as a separate digit. As mentioned
above, these three bits often act as a count field. When this is the case,
the letter `N' will be used instead of a digit and the role played by the
count field will be explained in the stack behaviour and description sections.
Remember that any bytecode listed as xxN can be either one or three bytes
long!
The hex value of the bytecode is also given as tools such as ViewFrame
tend to show raw data in hex format rather than octal. In the case of
bytecodes with a count field, the hex value given will be with the count
set to zero.
Bytecode is executed on a virtual machine that uses a stack of NewtonScript
values (see section 1.2), hence how each instruction modifies the stack
important. The stack behaviour is shown as:
{items removed from stack by execution} -> {items placed on stack after
execution}
The items are listed in the order that they are pushed onto the stack,
i.e. the leftmost is the first item pushed and the rightmost the last.
Repetition of items is indicated by braces (...) followed by a repetition
count. If an item on the stack must be of a certain type, the description
says so, otherwise placeholders such as <#007F>X" are used. References
to arrays, frames and symbols are indicated by [], {} and ' respectively.
An empty stack is indicated by the symbol <#00D8>.
2.3 ByteCode Descriptions
000 (00) pop
X -> <#00D8>
Removes the top item from the stack.
001 (01) dup
X -> X, X
Duplicates the top item on the stack.
002 (02) return
<#00D8> -> <#00D8>
Exits from a routine. By convention, the top item on the stack is the
return value of the routine. It is possible to return from a routine with
more items on the stack (or none at all), however the current Apple NewtonScript
compiler ensures that the functions that it produces always leave one and
only one value on the stack.
003 (03) self
<#00D8> -> self
Pushes a reference to the frame containing the current function onto the
stack. In other words this pushes the value represented by the NewtonScript
value self.
004 (04) Create closure (?)
{CodeBlock} -> {CodeBlock}
This bytecode is inserted by the compiler whenever it has just pushed a
CodeBlock frame either as a parameter to a function call or message send,
or as the target of a call...with statement. My guess is that this initialises
the three reserved fields in the argFrame of the CodeBlock (see 1.1), but
I haven't yet attempted to verify this.
005 (05) foreach next
[iterator] -> <#00D8>
Steps to next item in iteration. See 307 000 021 - foreach for a full
explanation.
006 (06) foreach complete
[iterator] -> Boolean
Checks if iteration complete. See 307 000 021 - foreach for a full explanation.
007 000 007 (07 00 07) End exception handling (?)
<#00D8> -> <#00D8>
See 31N - onexception for a full explanation.
01N (08) Apparently unused
02N (10) Apparently unused
03N (18) Push literal value
<#00D8> -> literal N
Pushes the literal which is element N of the literals array of this code
block onto the stack.
04N (20) Push immediate value
<#00D8> -> N
N here is treated as an immediate value as described in section 1.2. This
instruction is used to push any immediate value that can be represented
in 16 (sign extended) bits or less. Values that require the full 32 bits
are stored as literals and pushed using 03N.
Some examples of the use of this instruction:
20 push 0
22 push Nil
23 push @0
24 push 1
27 00 1A push True
27 02 86 push $(
27 FF F8 push -2
05N (28) Call global
(Arg)N, 'FunctionSymbol -> <#00D8>
N arguments to the function are expected on the stack (pushed in the same
order that they are listed in the function parameter list), followed by
a reference to the symbol that represents the name of the function (this
is stored in the literals array and pushed using 03N). Although the stack
description indicates that the instruction itself pushes nothing onto the
stack, remember that all NewtonScript functions conventionally push a single
item onto the stack before returning.
Example:
max( foo, bar ) translates to:
7B push foo (assuming it to be the first parameter of the function)
7C push bar (assuming it to be the second)
18 push 'max (assuming it to be the first literal in the function)
2A Call a global function with two arguments.
06N (30) Call ... with
(Arg)N, {004'd CodeBlock} -> <#00D8>
This directly translates the NewtonScript <#007F>call ... with ()" construct.
N arguments to the function are expected on the stack (pushed in the same
order that they are listed in the function parameter list), followed by
a reference to a CodeBlock frame that has had the 004 instruction called
on it (qv). As before, remember that all NewtonScript functions conventionally
push a single item onto the stack before returning.
Example:
call func(foo) return foo with (1) translates to:
24 push 1
18 push func (assuming it to be the first literal in the containing
function)
Note that inline function definitions like this are encoded as
a
CodeBlock frame in the literals array of the containing function.
04 Form a closure for the func (?)
31 Call ... with passing one argument.
07N (38) Send message
(Arg)N, {Receiver}, 'Message -> <#00D8>
This directly translates the NewtonScript <#007F>:" construct. N arguments
to the message are expected on the stack (pushed in the same order that
they are listed in the function parameter list), followed by a reference
to the receiving frame and then the symbol of the message. Again, remember
that all NewtonScript functions conventionally push a single item onto
the stack before returning.
Example:
:Open() translates to:
03 push self
18 push 'Open (assuming it to be the first literal in the function)
38 Send message with zero parameters.
10N (40) Conditional send message
(Arg)N, {Receiver}, 'Message -> <#00D8> or Nil
This directly translates the NewtonScript <#007F>:?" construct. N arguments
to the message are expected on the stack (pushed in the same order that
they are listed in the function parameter list), followed by a reference
to the receiving frame and then the symbol of the message to be sent if
the given function exists. If the function doesn't exist, Nil is placed
on the stack else it's left up to the function to put something there.
Example:
child:?DoItIfYouCan() translates to:
7B push child (assuming it to be the first parameter of the function)
18 push 'DoItIfYouCan (assuming it to be the first literal in the
function)
40 Send message with zero parameters if possible.
11N (48) Inherited send message
(Arg)N, 'Message -> <#00D8>
This directly translates the NewtonScript <#007F>inherited:" construct.
N arguments to the message are expected on the stack (pushed in the
same order that they are listed in the function parameter list), followed
by the symbol of the message. Again, remember that all NewtonScript functions
conventionally push a single item onto the stack before returning.
Example:
inherited:Open() translates to:
18 push 'Open (assuming it to be the first literal in the function)
48 Send message with zero parameters to inheritance parent.
12N (50) Conditional inherited send message
(Arg)N, 'Message -> <#00D8> or Nil
This directly translates the NewtonScript <#007F>inherited:?" construct.
N arguments to the message are expected on the stack (pushed in the same
order that they are listed in the function parameter list), followed by
the symbol of the message to be sent if the given function exists. If
the function doesn't exist, Nil is placed on the stack else it's left up
to the function to put something there.
Example:
inherited:?DoItIfYouCan() translates to:
18 push 'DoItIfYouCan (assuming it to be the first literal in the
function)
50 Send message with zero parameters to inheritance parent (if possible).
13N (58) Goto
<#00D8> -> <#00D8>
This instruction causes execution to pass to location N, where N is the
offset from the start of the block of bytecode (i.e. gotos are absolute,
not relative).
14N (60) Goto if not Nil
X -> <#00D8>
X is removed from the stack and examined. If it is Nil, execution continues
with the next instruction in sequence. If it is not Nil, execution passes
to location N where N is the offset from the start of the block of bytecode.
15N (68) Goto if Nil
X -> <#00D8>
X is removed from the stack and examined. If it is not Nil, execution
continues with the next instruction in sequence. If it is Nil, execution
passes to location N where N is the offset from the start of the block
of bytecode.
16N (70) Push external variable
<#00D8> -> value of external variable named by literal N
This bytecode pushes onto the stack the value contained in a variable or
slot from outside of the function. The name of the variable to push is
given by literal N in the function's literals array; this literal should
be a symbol reference.
Example:
length( functions ) translates to:
70 push value of functions (assuming 'functions is the first entry
in
the literals array of the containing CodeBlock)
C7 00 12 Return the length of functions.
17N (78) Push local variable
<#00D8> -> value of local variable in slot N of argFrame
This bytecode pushes onto the stack the value contained in the Nth slot
of the CodeBlock's argFrame frame. In other words, this bytecode is commonly
used to push either the value of an argument to the function or that of
a local variable.
Example:
func( foo ) return foo translates to:
73 push value of foo (remember argument slots start after the three
fixed
slots in argFrame, hence the index is 3 not 0).
02 Return
20N (80) Create frame
(SlotValue)N, [frameMap] -> {frame}
This bytecode creates a frame given N slot values on the stack plus a frame
map describing the slots in the frame. The frame map is an array whose
first item is Nil and then each subsequent item is a reference to a symbol
giving the name of the corresponding slot in the frame being created.
The values are pushed onto the stack in the order of the slot names in
the frame map. If this is as clear as mud, maybe the example will help:
Example:
local fr := { size: 4, count: 7 } translates to:
27 00 10 push 4
27 00 1C push 7
18 push frame map (assuming it to be the first literal in the CodeBlock)
The frame map is an array thus: [Nil, 'size, 'count]
82 Create a frame with two slots
A6 Store in fr (assumed to be the 6th slot in argFrame)
A final NB. The frame map array isn't a normal array of class Array.
Its class is a small integer, whose value seems to be made up from three
possible flags. I'm investigating the meaning of these flags at the moment.
21N (88) Create array
(Element)N, 'Class -> [Class: ...] or
Integer, 'Class -> [Class: ...] for 8F FF FF
This bytecode creates an array given N element values on the stack plus
a reference to a symbol defining the class of the array. The class of
the array is specified in NewtonScript with an initial <#007F><class>:"
inside the array, e.g.: [stepChildren:]. If no class is specified, then
the default class of Array is used.
In the special case of N being 0xFFFF, the instruction takes a single NewtonScript
integer from the stack instead of N element values and creates an empty
array (i.e. each element is Nil) with the given number of elements.
Example:
local foo := [ 4, 7 ] translates to:
27 00 10 push 4
27 00 1C push 7
18 push 'Array (assuming it to be the first literal in the CodeBlock)
8A Create an array with two elements
A5 Store in foo (assumed to be the 5th slot in argFrame)
221 (91) Push slot value
{frame}, 'SlotName -> slot value
This takes a frame and the name of a slot in that frame and pushes the
value contained in the slot. If the slot name is a single name then it
is represented by a symbol reference. If the slot name is a path expression
(e.g. foo.bar) then it is represented by an array of class pathExpr with
one element per section of the path. Each element is then a reference
to the symbol for that part of the path.
Example:
return self.foo.bar translates to:
03 push self
18 push '[pathExpr: 'foo, 'bar] (assuming it to be the first literal)
91 Get slot value
02 Return
230 (98) Assign to slot
{X}, 'SlotName, Y -> <#00D8>
This assigns the value Y to the slot SlotName of frame X, i.e. X.SlotName
:= Y.
As with 221 (push slot value) the slot name is represented either by a
simple symbol reference or a pathExpr array for a complex path expression.
231 (99) Assign to slot and push result
{X}, 'SlotName, Y -> Y
Identical to 230 except that the value assigned to the slot is also pushed
onto the stack.
24N (A0) Assign to local variable
X -> <#00D8>
Value X is assigned to the Nth slot of the CodeBlock's argFrame frame.
In other words, this bytecode is usually used to set the value of a local
variable.
Example:
local x := '[] translates to:
18 push '[] (assuming it to be the first literal)
A7 00 10 Assign to x (assuming it to be 16th slot in argFrame)
25N (A8) Assign to external variable
X -> <#00D8>
Value X is assigned to a variable or slot from outside of the function.
The name of the variable to assign to is given by literal N in the function's
literals array; this literal should be a symbol reference.
26N (B0) Increment local variable
Increment -> Increment, local variable N + Increment
This bytecode adds the given integer increment to the local variable in
the Nth slot of the CodeBlock's argFrame and the returns both the increment
and the new value of the local variable. This instruction is used internally
as part of the coding of for loops.
27N (B8) for loop goto
Increment, value, limit -> <#00D8>
This instruction is placed at the end of a for loop construct. It takes
the increment, current value of the counter (after having been incremented
by 26N) and the limit value of the loop. If the counter value now exceeds
the loop limit, execution continues with the next instruction in sequence.
If it does not, execution passes to location N where N is the offset from
the start of the block of bytecode. This instruction is used internally
as part of the coding of for loops.
300 (C0) Add
X, Y -> X+Y
The 30N bytecode series encodes a number of different low<#0106>level system
function calls. Any NewtonScript operator or built<#0106>in function not
explicitly mentioned here is implemented by using a standard function call
(bytecode 05N). This first bytecode of the series implements addition
(of integers or floats).
301 (C1) Subtract
X, Y -> X-Y
Subtraction of integers or floats.
302 (C2) Dereference array element
[X], Y -> X[Y]
This pushes the array element Y of array X. Y should be an integer.
303 (C3) Assign to array element and push
[X], Y, Z -> Z
This assignes value Z to the pushes the array element Y of array X i.e.
X[Y] := Z. Y should be an integer.
304 (C4) Comparison for equality
X, Y -> Boolean (X=Y)
This compares X and Y for equality and pushes a Boolean result (True or
Nil).
305 (C5) Not
X -> Boolean (not X)
This takes any value X and returns not X. Not Nil is True, not anything
else is Nil.
306 (C6) Comparison for inequality
X, Y -> Boolean (X<>Y)
This compares X and Y for inequality and pushes a Boolean result (True
or Nil).
307 000 007 (C7 00 07) Multiply
X, Y -> X*Y
This implements multiplication (of integers or floats).
307 000 010 (C7 00 08) Divide (with float result)
X, Y -> Float (X/Y)
This implements division (of integers or floats) with a float result, i.e.
it corresponds to the <#007F>/" operator in NewtonScript.
307 000 011 (C7 00 09) Divide (with integer result)
X, Y -> Integer (X div Y)
This implements division of integers with an integer result, i.e. it corresponds
to the <#007F>div" operator in NewtonScript.
307 000 012 (C7 00 0A) Compare less than
X, Y -> Boolean (X < Y)
This checks if X is less than Y and pushes a Boolean result (True or Nil).
307 000 013 (C7 00 0B) Compare greater than
X, Y -> Boolean (X > Y)
This checks if X is greater than Y and pushes a Boolean result (True or
Nil).
307 000 014 (C7 00 0C) Compare greater than or equal
X, Y -> Boolean (X >= Y)
This checks if X is greater than or equal to Y and pushes a Boolean result
(True or Nil).
307 000 015 (C7 00 0D) Compare less than or equal
X, Y -> Boolean (X <= Y)
This checks if X is less than or equal to Y and pushes a Boolean result
(True or Nil).
307 000 016 (C7 00 0E) Binary and
Integer X, integer Y -> Integer band(X,Y)
This performs a binary and of the two integers X and Y.
307 000 017 (C7 00 0F) Binary or
Integer X, integer Y -> Integer bor(X,Y)
This performs a binary or of the two integers X and Y.
307 000 020 (C7 00 10) Binary not
Integer X -> Integer bnot(X)
This performs a binary not of the integer X.
307 000 021 (C7 00 11) foreach
X, Boolean -> [Iterator]
This is the fundamental instruction used to implement all NewtonScript
foreach loops. If the Boolean is Nil it begins a normal foreach loop and
if it is True it begins a foreach deeply loop. The iterator returned is
then passed to the other two related instructions 005 (foreach next) and
006 (foreach complete). The iterator itself is an array containing various
useful state information about what is being iterated over and how far
the iteration has progressed.
005 (foreach next) takes the iterator and modifies it so that the next
item in the iteration becomes the current item. 006 (foreach complete)
checks the state of the iterator and returns True if all items have been
iterated over and Nil if not.
So, a foreach loop is implemented in bytecode as:
?? push thing being iterated over
27<#0102>00<#0102>1A/22 push True or Nil (for deeply or not)
C7 00 11 Create foreach iterator
A? Assign it to a local variable.
... Perform other initialisation (depends on the loop type)
5F ?? ?? Goto end of loop check (Y:)
X: ... Code executed each time round the iteration.
7? Push the local variable referencing the iterator
05 And move to the next thing to be iterated
Y: 7? Push the iterator again
06 And check if we've finished iterating
6F ?? ?? Go round the loop again if not (X:)
307 000 022 (C7 00 12) length
X -> Integer length(X)
This implements the length function.
307 000 023 (C7 00 13) clone
X -> clone(X)
This implements the clone function.
307 000 024 (C7 00 14) SetClass
X, 'Symbol -> modified X
This implements the SetClass function e.g. SetClass( data, 'Binary ).
X is a complex data structure (frame, array or data block); SetClass sets
the class of that structure and then returns a reference to the now modified
structure.
307 000 025 (C7 00 15) AddArraySlot
[X], Y -> Y
This implements the AddArraySlot function e.g. AddArraySlot( foo, 2 ).
Y is added as the last element of the array X.
307 000 026 (C7 00 16) Make string
[Array of strings] -> String
This takes an array of strings and combines them into a single string.
This function is used to implement the NewtonScript operators <#007F>&"
and <#007F>&&". <#007F>&&" is currently done by inserting an extra single
space string into the array of strings to be translated.
Example:
fred && "me" translates to:
70 push value of fred ('fred is the first literal)
19 1A 1B push " " (second), "me" (third) and 'Array (fourth)
8B Make an array from 3 parameters and a class symbol
C7 00 16 Make a string from the array
307 000 027 (C7 00 17) Slot exists
{X}, 'SlotName -> Boolean (True if X contains the given slot)
This implements the exists operator, but only when checking for the existance
of a slot in a frame (e.g. foo.bar exists). When checking for the existance
of a variable, a function call (bytecode 05N) to the function HasVar is
used instead.
307 000 030 (C7 00 18) ClassOf
X -> ClassOf(X)
This implements the ClassOf function.
31N (C8) onexception
('Exception, byte offset)N -> <#00D8>
This takes N pairs of an exception symbol and an offset into the bytecode
to jump to if that exception occurs. This is used to encode the high level
try ... onexception construct. Note that unlike all the goto bytecodes,
the offset is encoded as a standard NewtonScript integer. At the end of
both the normal and the exception code, the bytecode 007 000 007 is executed
- I guess it clears whatever state was set up.
Example:
try 1/0 onexception |evt.ex.div0| do nil becomes:
18 push '|evt.ex.div0| (first literal)
27 00 40 push integer 16
C9 onexception '|evt.ex.div0| goto 16
24 20 push 1, push 0
C7 00 08 1/0
07 00 07 End exception block (?)
5F 00 14 Goto 20 (i.e. the end of this block)
16: 22 Push Nil
07 00 07 End exception block (?)
320 - 377 (D0 - FF) Apparently unused
(null)